STACD.CONF(5)
NAME
stacd.conf - stacd(8) configuration file
SYNOPSIS
/etc/stas/stacd.conf
DESCRIPTION
When stacd(8) starts up, it reads its configuration from stacd.conf.
CONFIGURATION FILE FORMAT
stacd.conf is a plain text file divided into sections, with configuration entries in the style key=value. Spaces immediately before or after the = are ignored. Empty lines are ignored as well as lines starting with #, which may be used for commenting.
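As a rough illustration of the format described above, the following sketch reads a stacd.conf-style fragment with Python's standard configparser module. It is an assumption made for illustration only, not a description of how stacd itself parses the file. The sample values and the parse_conf helper are hypothetical.

```python
import configparser

# Hypothetical sample; only options documented on this page are used.
SAMPLE = """
# Lines starting with '#' are comments.
[Global]
ip-family = ipv4
queue-size=128
"""

def parse_conf(text):
    """Return {section: {key: value}} for a stacd.conf-style text."""
    parser = configparser.ConfigParser(
        comment_prefixes=("#",),  # only '#' starts a comment line
        delimiters=("=",),        # entries use the key=value style
        interpolation=None,       # keep values exactly as written
    )
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}

conf = parse_conf(SAMPLE)
```

Spaces around the = sign are stripped by the parser, so both entries in the sample yield clean keys and values.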
OPTIONS
[Global] section
The following options are available in the [Global] section:
tron=
hdr-digest=
data-digest=
kato=
ip-family=
Choices are ipv4, ipv6, or ipv4+ipv6.
Defaults to ipv4+ipv6.
nr-io-queues=
Note: This parameter is identical to that provided by nvme-cli.
Default: Depends on kernel and other run time factors (e.g. number of CPUs).
nr-write-queues=
Note: This parameter is identical to that provided by nvme-cli.
Default: Depends on kernel and other run time factors (e.g. number of CPUs).
nr-poll-queues=
Note: This parameter is identical to that provided by nvme-cli.
Default: Depends on kernel and other run time factors (e.g. number of CPUs).
queue-size=
Overrides the default number of elements in the I/O queues created by the driver. This option will be ignored for discovery, but will be passed on to the subsequent connect call.
Note: This parameter is identical to that provided by nvme-cli.
Defaults to 128.
reconnect-delay=
Overrides the default delay before reconnect is attempted after a connect loss.
Note: This parameter is identical to that provided by nvme-cli.
Defaults to 10, that is, a reconnect is attempted every 10 seconds.
ctrl-loss-tmo=
Overrides the default controller loss timeout period (in seconds).
Note: This parameter is identical to that provided by nvme-cli.
Defaults to 600 seconds (10 minutes).
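To see how reconnect-delay= and ctrl-loss-tmo= interact, the sketch below estimates how many reconnect attempts fit within the controller loss timeout. It assumes the number of attempts is the timeout divided by the delay, rounded up. That relationship is an assumption stated for illustration and is not taken from this page. The function name is hypothetical.

```python
import math

def max_reconnect_attempts(ctrl_loss_tmo=600, reconnect_delay=10):
    """Estimate the reconnect attempts made before the controller is given up.

    Assumption: attempts = ceil(ctrl_loss_tmo / reconnect_delay).
    """
    if reconnect_delay <= 0:
        raise ValueError("reconnect_delay must be a positive number of seconds")
    return math.ceil(ctrl_loss_tmo / reconnect_delay)

# With the documented defaults (600 s timeout, 10 s delay).
attempts = max_reconnect_attempts()
```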
disable-sqflow=
Note: This parameter is identical to that provided by nvme-cli.
Defaults to false.
ignore-iface=
There is no guarantee that a route exists to reach a given IOC. However, the socket option SO_BINDTODEVICE can be used to force the connection to be made on a specific interface instead of letting the routing tables decide where to make the connection.
This option determines whether stacd will use SO_BINDTODEVICE to force connections on an interface or just rely on the routing tables. The default is to use SO_BINDTODEVICE, in other words, stacd does not ignore the interface.
BACKGROUND: By default, stacd will connect to IOCs on the same interface that was used to retrieve the discovery log pages. If stafd discovers a DC on an interface using mDNS, and stafd connects to that DC and retrieves the log pages, it is expected that the storage subsystems listed in the log pages are reachable on the same interface where the DC was discovered.
For example, let's say a DC is discovered on interface ens102. Then all the subsystems listed in the log pages retrieved from that DC must be reachable on interface ens102. If this doesn't work, for example you cannot "ping -I ens102 [storage-ip]", then the most likely explanation is that proxy arp is not enabled on the switch that the host is connected to on interface ens102. Whatever you do, resist the temptation to manually set up the routing tables or to add alternate routes going over a different interface than the one where the DC is located. That simply won't work. Make sure proxy arp is enabled on the switch first.
Setting routes won't work because, by default, stacd uses the SO_BINDTODEVICE socket option when it connects to IOCs. This option is used to force a socket connection to be made on a specific interface instead of letting the routing tables decide where to connect the socket. Even if you were to manually configure an alternate route on a different interface, the connections (i.e. host to IOC) will still be made on the interface where the DC was discovered by stafd.
Defaults to false.
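The effect of SO_BINDTODEVICE can be sketched with an ordinary user-space socket. This is only an illustration of the socket option itself and makes no claim about how stacd or the kernel NVMe driver applies it. The helper names are hypothetical, and setting the option may require elevated privileges.

```python
import socket

# socket.SO_BINDTODEVICE exists on Linux; 25 is its Linux value (fallback).
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)

def bind_option(iface, ignore_iface=False):
    """Return the setsockopt() arguments to use, or None to rely on routing."""
    if ignore_iface or not iface:
        return None
    return (socket.SOL_SOCKET, SO_BINDTODEVICE, iface.encode() + b"\0")

def open_socket(iface, ignore_iface=False):
    """Create a TCP socket, bound to iface unless ignore_iface is set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    option = bind_option(iface, ignore_iface)
    if option is not None:
        sock.setsockopt(*option)  # may raise PermissionError without privileges
    return sock
```

With ignore_iface set to true, no option is applied and the routing tables alone decide which interface carries the connection.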
[I/O controller connection management] section
Connectivity between hosts and subsystems in a fabric is controlled by Fabric Zoning. Entities that share a common zone (i.e., are zoned together) are allowed to discover each other and establish connections between them. Fabric Zoning is configured on Discovery Controllers (DC). Users can add/remove controllers and/or hosts to/from zones.
Hosts have no direct knowledge of the Fabric Zoning configuration that is active on a given DC. As a result, if a host is impacted by a Fabric Zoning configuration change, it will be notified of the connectivity configuration change by the DC via Asynchronous Event Notifications (AEN).
Table 1. List of terms used in this section:
Term | Description |
AEN | Asynchronous Event Notification. A CQE (Completion Queue Entry) for an Asynchronous Event Request that was previously transmitted by the host to a Discovery Controller. AENs are used by DCs to notify hosts that a change (e.g., a connectivity configuration change) has occurred. |
DC | Discovery Controller. |
DLP | Discovery Log Page. A host will issue a Get Log Page command to retrieve the list of controllers it may connect to. |
DLPE | Discovery Log Page Entry. The response to a Get Log Page command contains a list of DLPEs identifying each controller that the host is allowed to connect with. Note that DLPEs may contain both I/O Controllers (IOCs) and Discovery Controllers (DCs). DCs listed in DLPEs are called referrals. stacd only deals with IOCs. Referrals (DCs) are handled by stafd. |
IOC | I/O Controller. |
Manual Config | Refers to manually adding entries to stacd.conf with the controller= parameter. |
Automatic Config | Refers to receiving configuration from a DC as DLPEs |
External Config | Refers to configuration done outside of the nvme-stas framework, for example using nvme-cli commands |
DCs notify hosts of connectivity configuration changes by sending AENs indicating a "Discovery Log" change. The host uses these AENs as a trigger to issue a Get Log Page command. The response to this command is used to update the list of DLPEs containing the controllers the host is allowed to access. Upon reception of the current DLPEs, the host will determine whether DLPEs were added and/or removed, which will trigger the addition and/or removal of controller connections. This happens in real time and may affect active connections to controllers including controllers that support I/O operations (IOCs). A host that was previously connected to an IOC may suddenly be told that it is no longer allowed to connect to that IOC and should disconnect from it.
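The comparison of old and new DLPEs described above can be sketched as a set difference. This is a minimal sketch, not stacd's implementation. It assumes each DLPE is reduced to a hypothetical (trtype, traddr, trsvcid, nqn) tuple, and the addresses and NQNs are made up.

```python
def diff_dlpes(previous, current):
    """Return (added, removed) DLPEs between two Get Log Page responses."""
    prev, curr = set(previous), set(current)
    return curr - prev, prev - curr

# Hypothetical DLPEs before and after a "Discovery Log" change AEN.
before = [
    ("tcp", "192.0.2.10", "4420", "nqn.2014-08.org.example:subsys-a"),
    ("tcp", "192.0.2.11", "4420", "nqn.2014-08.org.example:subsys-b"),
]
after = [
    ("tcp", "192.0.2.11", "4420", "nqn.2014-08.org.example:subsys-b"),
    ("tcp", "192.0.2.12", "4420", "nqn.2014-08.org.example:subsys-c"),
]

added, removed = diff_dlpes(before, after)
```

Entries in added are candidates for new connections. Entries in removed are candidates for disconnection, subject to disconnect-scope= and disconnect-trtypes=.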
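IOC connection creation. There are 3 ways to configure IOC connections on a host: Manual Config (controller= entries in stacd.conf), Automatic Config (DLPEs received from a DC), and External Config (tools outside of the nvme-stas framework, for example nvme-cli).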
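IOC connection removal/prevention. There are 3 ways to remove (or prevent) connections to an IOC: Manual Config (exclude= entries in stacd.conf), Automatic Config (removal of DLPEs by a DC), and External Config (for example, the nvme-cli command "nvme disconnect").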
The decision by the host to automatically disconnect from an IOC following connectivity configuration changes is controlled by two parameters: disconnect-scope and disconnect-trtypes.
disconnect-scope=
In theory, hosts should only connect to IOCs that have been zoned for them. Connections to IOCs that a host is not zoned to have access to should simply not exist. In practice, however, users may not want hosts to disconnect from IOCs, or at least from some of them, in reaction to connectivity configuration changes.
Some users may prefer for IOC connections to be "sticky" and only be removed manually (nvme-cli or exclude=) or removed by a system reboot. Specifically, they don't want IOC connections to be removed unexpectedly on DLPE removal. These users may want to set disconnect-scope to no-disconnect.
It is important to note that when IOC connections are removed, ongoing I/O transactions will be terminated immediately. There is no way to tell what happens to the data being exchanged when such an abrupt termination happens. If a host was in the middle of writing to a storage subsystem, there is a chance that outstanding I/O operations may not successfully complete.
Values:
only-stas-connections
In this mode, when a DLPE is removed as a result of connectivity configuration changes, the corresponding IOC connection will be removed by stacd.
Connections to IOCs made externally, e.g. using nvme-cli, will not be affected, unless they happen to be duplicates of connections made by stacd. It's simply not possible for stacd to tell that a connection was previously made with nvme-cli (or any other external tool). So, it's good practice to avoid duplicating configuration between stacd and external tools.
Users wanting to persist some of their IOC connections regardless of connectivity configuration changes should not use nvme-cli to make those connections. Instead, they should hard-code them in stacd.conf with the controller= parameter. Using the controller= parameter is the only way for a user to tell stacd that a connection must be made and not be deleted "no-matter-what".
all-connections-matching-disconnect-trtypes
In this mode, as DLPEs are removed as a result of connectivity configuration changes, the corresponding IOC connections will be removed by the host immediately whether they were made by stacd, nvme-cli, or any other way. Basically, stacd audits all IOC connections matching the transport type specified by disconnect-trtypes=.
NOTE. This mode implies that stacd will only allow Manually Configured or Automatically Configured IOC connections to exist. Externally Configured connections using nvme-cli (or other external mechanism) that do not match any Manual Config (stacd.conf) or Automatic Config (DLPEs) will get deleted immediately by stacd.
no-disconnect
In this mode, stacd does not remove IOC connections when DLPEs are removed. Instead, users can remove connections by issuing the nvme-cli command "nvme disconnect", by adding an exclude= entry to stacd.conf, or by waiting until the next system reboot, at which time all connections will be removed.
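The three disconnect-scope= modes can be summarized as a small decision function. This is a simplified sketch under stated assumptions, not stacd's actual logic: the function and its arguments are hypothetical, tracked_by_stacd stands for "stacd considers this connection its own", and exclude= handling is left out.

```python
def should_disconnect(scope, tracked_by_stacd, in_manual_config, in_dlpes,
                      trtype, disconnect_trtypes):
    """Decide whether an existing IOC connection should be removed."""
    # Connections backed by controller= entries or current DLPEs are kept.
    if in_manual_config or in_dlpes:
        return False
    if scope == "no-disconnect":
        return False
    if scope == "only-stas-connections":
        return tracked_by_stacd
    if scope == "all-connections-matching-disconnect-trtypes":
        return trtype in disconnect_trtypes
    raise ValueError(f"unknown disconnect-scope: {scope!r}")
```

For example, a tcp connection made with nvme-cli whose DLPE has disappeared is left alone in only-stas-connections mode, but is removed in all-connections-matching-disconnect-trtypes mode when tcp is listed in disconnect-trtypes=.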
disconnect-trtypes=
Can take the values tcp, rdma, fc, or a combination thereof by separating them with a plus (+) sign. For example: tcp+fc. No spaces are allowed between values and the plus (+) sign.
Values:
tcp
rdma
fc
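The value syntax for disconnect-trtypes= can be checked with a few lines of code. The sketch below is illustrative only and the function name is hypothetical. It accepts tcp, rdma, and fc joined by plus signs and rejects anything else, including spaces around the plus sign.

```python
VALID_TRTYPES = ("tcp", "rdma", "fc")

def parse_disconnect_trtypes(value):
    """Split a value such as 'tcp+fc' into a set of transport types."""
    parts = value.split("+")
    for part in parts:
        if part not in VALID_TRTYPES:
            # Also catches spaces around '+', e.g. 'tcp + fc'.
            raise ValueError(f"invalid transport type: {part!r}")
    return set(parts)

trtypes = parse_disconnect_trtypes("tcp+fc")
```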
connect-attempts-on-ncc=
If a host is currently failing to connect to an I/O controller and the NCC (Not Connected to CDC) bit associated with that I/O controller is asserted, the host can decide to stop trying to connect to that subsystem until connectivity is restored. The CDC (Centralized Discovery Controller) indicates that connectivity is restored by clearing the NCC bit.
The parameter connect-attempts-on-ncc= controls whether stacd will take the NCC bit into account when attempting to connect to an I/O Controller. Setting connect-attempts-on-ncc= to 0 means that stacd will ignore the NCC bit and will keep trying to connect. Setting connect-attempts-on-ncc= to a non-zero value indicates the number of connection attempts that will be made before stacd gives up trying. Note that this value should be set to a value greater than 1. In fact, when set to 1, stacd will automatically use 2 instead. The reason for this is simple. It is possible that a first connect attempt may fail.
Defaults to 0.
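The rules for connect-attempts-on-ncc= can be sketched as follows. This is an illustration of the behavior described above rather than stacd's code, and both function names are hypothetical.

```python
def effective_attempts(configured):
    """Map the configured value to the limit described in the text."""
    if configured < 0:
        raise ValueError("connect-attempts-on-ncc must not be negative")
    if configured == 0:
        return 0               # 0 means: ignore the NCC bit, retry forever
    return max(configured, 2)  # a value of 1 is treated as 2

def keep_trying(configured, failed_attempts, ncc_asserted):
    """Return True if another connection attempt should be made."""
    limit = effective_attempts(configured)
    if limit == 0 or not ncc_asserted:
        return True
    return failed_attempts < limit
```

With the default of 0, keep_trying() always returns True. With a value of 1 and the NCC bit asserted, it returns False once two attempts have failed.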
[Controllers] section
The following options are available in the [Controllers] section:
controller=
controller=transport=[trtype];traddr=[traddr];trsvcid=[trsvcid];host-traddr=[traddr];host-iface=[iface];nqn=[nqn]
Fields
transport=
Table 2. Transport type
trtype | Definition |
rdma | The network fabric is an rdma network (RoCE, iWARP, Infiniband, basic rdma, etc) |
fc | The network fabric is a Fibre Channel network. |
tcp | The network fabric is a TCP/IP network. |
loop | Connect to a NVMe over Fabrics target on the local host |
traddr=
trsvcid=
Depending on the transport type, this field will default to either 8009 or 4420 as follows.
UDP port 4420 and TCP port 4420 have been assigned by IANA for use by NVMe over Fabrics. NVMe/RoCEv2 controllers use UDP port 4420 by default. NVMe/iWARP controllers use TCP port 4420 by default.
TCP port 4420 has been assigned for use by NVMe over Fabrics and TCP port 8009 has been assigned by IANA for use by NVMe over Fabrics discovery. TCP port 8009 is the default TCP port for NVMe/TCP discovery controllers. There is no default TCP port for NVMe/TCP I/O controllers; the Transport Service Identifier (TRSVCID) field in the Discovery Log Entry indicates the TCP port to use.
The TCP ports that may be used for NVMe/TCP I/O controllers include TCP port 4420, and the Dynamic and/or Private TCP ports (i.e., ports in the TCP port number range from 49152 to 65535). NVMe/TCP I/O controllers should not use TCP port 8009. TCP port 4420 shall not be used for both NVMe/iWARP and NVMe/TCP at the same IP address on the same network.
Ref: IANA Service names port numbers[1]
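The port guidance for NVMe/TCP I/O controllers can be expressed as a simple check. The sketch below treats the ports listed above (4420 and the Dynamic/Private range) as the only acceptable ones, which is a strict reading adopted here for illustration. The function name is hypothetical.

```python
# Dynamic and/or Private TCP ports: 49152 to 65535 inclusive.
DYNAMIC_PORTS = range(49152, 65536)

def ioc_tcp_port_ok(port):
    """Return True if port follows the NVMe/TCP I/O controller guidance."""
    # Port 8009 (discovery) fails both tests and is therefore rejected.
    return port == 4420 or port in DYNAMIC_PORTS
```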
nqn=
This field is mandatory for I/O Controllers, but is optional for Discovery Controllers (DC). For the latter, the NQN will default to the well-known DC NQN: nqn.2014-08.org.nvmexpress.discovery if left undefined.
host-traddr=
host-iface=
dhchap-ctrl-secret=
hdr-digest=
data-digest=
nr-io-queues=
nr-write-queues=
nr-poll-queues=
queue-size=
kato=
reconnect-delay=
ctrl-loss-tmo=
disable-sqflow=
controller = transport=tcp;traddr=localhost;trsvcid=8009
controller = transport=tcp;traddr=2001:db8::370:7334;host-iface=enp0s8
controller = transport=fc;traddr=nn-0x204600a098cbcac6:pn-0x204700a098cbcac6
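A controller= value is a list of field=value pairs separated by semicolons. The sketch below splits such a value into a dictionary. It is a minimal illustration with a hypothetical function name, not the parser used by stacd.

```python
def parse_controller(value):
    """Split 'transport=tcp;traddr=...' into a {field: value} dictionary."""
    fields = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        key, sep, val = item.partition("=")
        if not sep:
            raise ValueError(f"expected field=value, got {item!r}")
        fields[key.strip()] = val.strip()
    return fields

ctrl = parse_controller("transport=tcp;traddr=2001:db8::370:7334;host-iface=enp0s8")
```

Because only the first = in each pair is used as the separator and colons are not special, IPv6 and Fibre Channel addresses pass through unchanged.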
exclude=
The syntax is the same as for "controller", except that only transport, traddr, trsvcid, nqn, and host-iface apply. Multiple exclude= keywords may appear in the config file to specify more than 1 excluded controller.
Note 1: A minimal match approach is used to eliminate unwanted controllers. That is, you do not need to specify all the parameters to identify a controller. Just specifying the host-iface, for example, can be used to exclude all controllers on an interface.
Note 2: exclude= takes precedence over controller=. A controller specified by the controller= keyword can be eliminated by the exclude= keyword.
Examples:
exclude = transport=tcp;traddr=fe80::2c6e:dee7:857:26bb # Eliminate a specific address
exclude = host-iface=enp0s8 # Eliminate everything on this interface
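The minimal match approach used by exclude= can be sketched as follows: a controller is excluded when every field given in an exclude entry matches, and fields left out of the entry are ignored. The function name and the dictionary representation are assumptions made for this illustration.

```python
def is_excluded(controller, excludes):
    """Return True if any exclude entry matches on all fields it specifies."""
    return any(
        entry and all(controller.get(key) == val for key, val in entry.items())
        for entry in excludes
    )

# The two exclude= examples above, written as dictionaries.
excludes = [
    {"transport": "tcp", "traddr": "fe80::2c6e:dee7:857:26bb"},
    {"host-iface": "enp0s8"},
]

on_enp0s8 = {"transport": "tcp", "traddr": "2001:db8::370:7334", "host-iface": "enp0s8"}
elsewhere = {"transport": "tcp", "traddr": "2001:db8::370:7334", "host-iface": "ens102"}
```

Here on_enp0s8 is excluded by the host-iface entry alone, while elsewhere matches neither entry and is kept.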
SEE ALSO
NOTES
1. IANA Service names port numbers
nvme-stas 2.2.1